Bootstrapping a Verb Lexicon for Biomedical Information Extraction
Abstract
The extraction of information from texts requires resources that contain both syntactic and semantic properties of lexical units. As the use of language in specialized domains, such as biology, can be very different from the general domain, there is a need for domain-specific resources to ensure that the information extracted is as accurate as possible. We are building a large-scale lexical resource for the biology domain, providing information about predicate-argument structure that has been bootstrapped from a biomedical corpus on the subject of E. coli. The lexicon is currently focussed on verbs, and includes both automatically-extracted syntactic subcategorization frames and semantic event frames that are based on annotation by domain experts. In addition, the lexicon contains manually-added explicit links between semantic and syntactic slots in corresponding frames. To our knowledge, this lexicon currently represents a unique resource within the biomedical domain.
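The three kinds of information named in the abstract (syntactic subcategorization frames, semantic event frames, and explicit slot-to-role links) can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions, not the lexicon's actual schema: the class names, the slot labels (SUBJ, OBJ), the role labels (AGENT, THEME) and the example verb "activate" are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SyntacticFrame:
    """A subcategorization frame: the grammatical slots a verb takes."""
    slots: tuple  # hypothetical labels, e.g. ("SUBJ", "OBJ")


@dataclass(frozen=True)
class SemanticFrame:
    """An event frame: the semantic roles taking part in the event."""
    roles: tuple  # hypothetical labels, e.g. ("AGENT", "THEME")


@dataclass
class VerbEntry:
    """One verb with its frames and the links between their slots."""
    lemma: str
    syntactic_frames: list = field(default_factory=list)
    semantic_frames: list = field(default_factory=list)
    # (syntactic frame index, semantic frame index) -> {slot: role}
    links: dict = field(default_factory=dict)

    def link(self, syn_index, sem_index, slot_to_role):
        """Record an explicit slot-to-role mapping between two frames."""
        syn = self.syntactic_frames[syn_index]
        sem = self.semantic_frames[sem_index]
        for slot, role in slot_to_role.items():
            if slot not in syn.slots:
                raise ValueError(f"unknown syntactic slot: {slot}")
            if role not in sem.roles:
                raise ValueError(f"unknown semantic role: {role}")
        self.links[(syn_index, sem_index)] = dict(slot_to_role)

    def role_for(self, syn_index, sem_index, slot):
        """Return the role linked to a slot, or None if no link exists."""
        return self.links.get((syn_index, sem_index), {}).get(slot)


# Illustrative entry only; not taken from the lexicon itself.
entry = VerbEntry(
    lemma="activate",
    syntactic_frames=[SyntacticFrame(("SUBJ", "OBJ"))],
    semantic_frames=[SemanticFrame(("AGENT", "THEME"))],
)
entry.link(0, 0, {"SUBJ": "AGENT", "OBJ": "THEME"})
```

Keeping the links separate from the frames mirrors the abstract's distinction between automatically extracted syntactic frames, expert-annotated event frames, and the manually added mappings between them; an extraction system could then look up which semantic role a parsed argument fills via role_for.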
Similar resources
A Specialised Verb Lexicon as the Basis of Fact Extraction in the Biomedical Domain
The BioLexicon is a standardised, reusable, lexical and conceptual resource suitable for advanced biomedical text mining. One of the unique features of the BioLexicon is the incorporation of rich syntactic and semantic patterns for a wide range of domain-relevant verbs, which have been acquired semiautomatically from biomedical corpora. Such types of information can be highly beneficial for inf...
Bootstrapping Biomedical Ontologies for Scientific Text using NELL
We describe an open information extraction system for biomedical text based on NELL (the Never-Ending Language Learner) (Carlson et al., 2010), a system designed for extraction from Web text. NELL uses a coupled semi-supervised bootstrapping approach to learn new facts from text, given an initial ontology and a small number of “seeds” for each ontology category. In contrast to previous applicat...
Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping
Information extraction systems usually require two dictionaries: a semantic lexicon and a dictionary of extraction patterns for the domain. We present a multilevel bootstrapping algorithm that generates both the semantic lexicon and extraction patterns simultaneously. As input, our technique requires only unannotated training texts and a handful of seed words for a category. We use a mutual boo...
Lexicon Acquisition with and for Symbolic NLP-Systems – a Bootstrapping Approach
We present a method of applying a broad-coverage LFG grammar of German in the process of semi-automatic lexicon acquisition from corpora. The identification of corpus instances that illustrate a certain subcategorization frame uniquely is done by a comparison of the numbers of analyses the grammar assigns to the corpus instances, under the assumption of different hypothetical lexicon entries fo...
Kidz in the ‘Hood: Syntactic Bootstrapping and the Mental Lexicon
Recent findings and theorizing on child language acquisition suggest that the verb lexicon is built by an arm-over-arm procedure that necessarily constructs the clause-level grammar of the exposure language on the fly as it acquires individual items (Gleitman, 1990; Gillette, Gleitman, Gleitman, and Lederer, 1999). This active learning process is likely to play a causal role in determining the ...